Claims 

What is claimed is: 

1. A computer-based method of retrieving one or more items from at least one 
database in response to a query specified by a user via at least one example set, the
method comprising the steps of:

constructing a scoring function from the at least one example set, wherein the 
scoring function is operable for use with a multidimensional indexing structure associated 
with the at least one database; and

retrieving, via the multidimensional indexing structure, the one or more database 
items that have the highest score as computed using the scoring function.

2. The method of claim 1, wherein the scoring function is constructed from 
multiple example sets wherein the multiple example sets comprise at least one of positive 
example sets and negative example sets. 

3. The method of claim 2, wherein the scoring function is constructed by 
combining respective scoring functions of the multiple example sets.

4. The method of claim 3, wherein combining the scoring functions of the 
positive example sets and the negative example sets comprises the steps of: 

modifying the scoring functions of the positive example sets and the negative 
example sets so that the scoring functions of the positive example sets assign low scores 
to representative samples of the negative example sets, and so that the scoring functions
of the negative example sets assign low scores to representative samples of the positive
example sets; and 

combining the modified scoring functions of the positive example sets and the 
negative example sets.

5. The method of claim 2, wherein the scoring function assigns a score to an
element of a search space associated with the at least one database equal to the minimum 
of: (i) the maximum of the scores assigned to the element by the scoring functions of the 
positive example sets; and (ii) the minimum of one minus the scores assigned to the
element by the scoring functions of the negative example sets.
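
By way of illustration only, the combination recited in claim 5 may be sketched as follows. The names `combined_score`, `near_zero`, and `near_one` are hypothetical and do not appear in the claims, and the sketch assumes that every scoring function returns a value between zero and one and that at least one positive example set is present.

```python
from typing import Callable, Sequence

Point = Sequence[float]
ScoreFn = Callable[[Point], float]  # assumed to return a value in [0, 1]


def combined_score(x: Point,
                   positive: Sequence[ScoreFn],
                   negative: Sequence[ScoreFn]) -> float:
    """Minimum of (i) the best positive score and (ii) the smallest
    complement (one minus score) over the negative example sets."""
    best_positive = max(f(x) for f in positive)
    # Assumption of this sketch: with no negative example sets, term (ii)
    # is taken to be 1.0 so that it does not constrain the result.
    smallest_complement = min((1.0 - g(x) for g in negative), default=1.0)
    return min(best_positive, smallest_complement)


# Hypothetical one-dimensional scoring functions, for illustration only.
near_zero = lambda x: max(0.0, 1.0 - abs(x[0]))
near_one = lambda x: max(0.0, 1.0 - abs(x[0] - 1.0))

print(combined_score([0.25], [near_zero], [near_one]))
```

In this sketch an element that lies close to a negative example set receives a low score regardless of how well it matches a positive example set, because the complement term caps the result.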

6. The method of claim 1, wherein the scoring function gives higher scores to 
database items that are more closely related to the query than to database items that are 
not as closely related to the query. 

7. The method of claim 1, wherein a scoring function is obtained for the at least 
one example set by:

computing a characteristic example from the at least one example set;

computing a dispersion characterization of the at least one example set in
association with the characteristic example; and 

using the characteristic example, the dispersion characterization, and one or more 
samples from the database to compute the scoring function.

8. The method of claim 7, wherein the characteristic example is a centroid, a 
median, or a mode computed over at least a portion of the example set. 

9. The method of claim 7, wherein the dispersion characterization associated with 
the example set comprises a covariance matrix, a standard deviation, central moments,
order statistics of differences, or third moments capturing asymmetry.

10. The method of claim 7, wherein the step of using the characteristic example, 
the dispersion characterization, and one or more samples from the database to compute 
the scoring function further comprises computing weighted distances between the
characteristic example and the one or more samples from the database using the
dispersion characterization to compute weights. 

11. The method of claim 10, wherein the weighted distances are Euclidean 
distances or Minkowski distances.
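
By way of illustration only, the steps of claims 7 through 11 may be sketched as follows, taking a centroid as the characteristic example and per-dimension standard deviations as the dispersion characterization. The function names and the `floor` parameter are assumptions of this sketch, and the use of inverse standard deviations as weights is only one possible choice of weighting.

```python
from statistics import mean, pstdev
from typing import List, Sequence

Point = Sequence[float]


def centroid(examples: Sequence[Point]) -> List[float]:
    """Characteristic example: per-dimension mean of the example set."""
    return [mean(column) for column in zip(*examples)]


def inverse_std_weights(examples: Sequence[Point],
                        floor: float = 1e-9) -> List[float]:
    """Weights derived from the dispersion: 1 / standard deviation per
    dimension. `floor` (an assumption) guards against division by zero."""
    return [1.0 / max(pstdev(column), floor) for column in zip(*examples)]


def weighted_minkowski(x: Point, center: Point, weights: Point,
                       p: float = 2.0) -> float:
    """Weighted Minkowski distance of order p (p = 2 gives Euclidean)."""
    total = sum((w * abs(a - c)) ** p for a, c, w in zip(x, center, weights))
    return total ** (1.0 / p)


examples = [(0.0, 0.0), (2.0, 4.0)]
center = centroid(examples)              # characteristic example
weights = inverse_std_weights(examples)  # from the dispersion
print(weighted_minkowski((3.0, 2.0), center, weights))
```

Dividing each coordinate difference by the standard deviation makes a dimension along which the examples vary widely count less than a dimension along which they agree closely.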

12. The method of claim 1, wherein a scoring function is computed for the at 
least one example set by converting one or more semi-metrics obtained using the at least 
one example set to one or more scores using a conversion function. 

13. The method of claim 12, wherein the semi-metrics are weighted Minkowski
distances from a representative sample of examples in the at least one example set, and 
further wherein weights are calculated using the examples in the at least one example set. 

14. The method of claim 13, wherein the weights are the inverse of standard 
deviations of the examples in the at least one example set. 

15. The method of claim 13, wherein the representative sample is a centroid of 
the examples in the at least one example set. 

16. The method of claim 12, wherein the conversion function is a monotonically 
non-increasing continuous function having a value equal to one at the origin and a value 
of zero at infinity. 

17. The method of claim 16, wherein the conversion function is a 
monotonically non-increasing continuous function having a value of one between zero 
and a first threshold value and a value of zero after a second threshold value. 
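
By way of illustration only, two conversion functions of the kind recited in claims 16 and 17 may be sketched as follows. The names `exponential_conversion` and `ramp_conversion`, the parameters `t1` and `t2`, and the linear interpolation between the two thresholds are assumptions of this sketch; the claims require only a continuous, monotonically non-increasing function with the stated end values.

```python
import math


def exponential_conversion(distance: float) -> float:
    """Continuous and non-increasing: 1 at the origin, tending to 0 as the
    distance grows without bound."""
    return math.exp(-distance)


def ramp_conversion(distance: float, t1: float, t2: float) -> float:
    """1 on [0, t1], 0 at and beyond t2, and linear in between.
    The linear middle segment is an assumption of this sketch."""
    if not 0.0 <= t1 < t2:
        raise ValueError("expected 0 <= t1 < t2")
    if distance <= t1:
        return 1.0
    if distance >= t2:
        return 0.0
    return (t2 - distance) / (t2 - t1)


print(exponential_conversion(0.0))    # score at the origin
print(ramp_conversion(2.0, 1.0, 3.0))  # halfway between the thresholds
```

A function that reaches zero at a finite threshold, as in the second example, lets a search discard any region of the search space whose minimum distance exceeds that threshold.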

18. The method of claim 1, wherein the user specifies the number of items to
retrieve from the database. 

19. The method of claim 2, wherein the retrieving step further comprises the step 
of searching the multidimensional indexing structure to retrieve from the database the 
items having the highest score. 

20. The method of claim 19, wherein the multidimensional indexing structure is 
used to execute different queries. 

21. The method of claim 19, wherein the multidimensional indexing structure is 
based on a recursive partition of a search space associated with the database using 
hyperplanes parallel to coordinate axes or surfaces other than hyperplanes parallel to 
coordinate axes. 
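
By way of illustration only, a recursive partition using hyperplanes parallel to the coordinate axes may be sketched as a simple k-d-tree-style structure. The `Node` class, the `build` function, the `leaf_size` parameter, and the round-robin choice of splitting axis are all assumptions of this sketch and are not taken from the claims.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    lower: Point                          # bounding box, lower corner
    upper: Point                          # bounding box, upper corner
    points: Optional[List[Point]] = None  # set only at leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: Sequence[Point], leaf_size: int = 2, depth: int = 0) -> Node:
    """Recursively split a non-empty point set at the median along one
    coordinate axis, cycling through the axes by depth."""
    dims = len(points[0])
    lower = tuple(min(p[d] for p in points) for d in range(dims))
    upper = tuple(max(p[d] for p in points) for d in range(dims))
    if len(points) <= leaf_size:
        return Node(lower, upper, points=list(points))
    axis = depth % dims
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2  # both halves are non-empty here
    return Node(lower, upper,
                left=build(ordered[:mid], leaf_size, depth + 1),
                right=build(ordered[mid:], leaf_size, depth + 1))


tree = build([(0.0, 0.0), (1.0, 5.0), (2.0, 1.0), (3.0, 4.0)])
print(tree.lower, tree.upper)
```

Storing a bounding box at every node is what allows the same structure to serve different queries: a score bound for a node can be computed from the box alone, without visiting the items below it.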

22. The method of claim 19, wherein searching the multidimensional indexing 
structure comprises the steps of: 

using scoring functions of the multiple example sets to search a tree to identify 
candidate nodes; and 

using the scoring functions of the multiple example sets to score items stored at 
leaves of the multidimensional indexing structure. 

23. The method of claim 22, wherein identifying candidate nodes comprises the 
steps of: 

computing for each scoring function of the positive example sets, the maximum 
possible score of an item stored at the node or at one of the descendants of the node;

computing the maximum of the maximum scores;

computing for each scoring function of the negative example sets, the minimum
possible score of an item stored at the node or at one of the descendants of the node; 

computing the minimum of one minus the minimum scores; 

computing the minimum of: (i) the maximum of the maximum scores; and (ii) the 
minimum of one minus the minimum scores;

comparing the computed minimum to the minimum of the scores in a current 
result set; 

declaring that a node is a candidate if the minimum is not smaller than the 
minimum of the scores in the current result set; and

declaring that the node is not a candidate otherwise.
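
By way of illustration only, the candidate test of claim 23 may be sketched as follows. The function names are hypothetical. The sketch assumes that the per-node bounds (the maximum possible score under each positive scoring function and the minimum possible score under each negative scoring function) have already been computed by some other means, and that the current result set is non-empty.

```python
from typing import Sequence


def node_upper_bound(max_positive: Sequence[float],
                     min_negative: Sequence[float]) -> float:
    """Upper bound on the combined score of any item at or below a node.

    max_positive[i]: maximum possible score under the i-th positive
                     scoring function within the node.
    min_negative[j]: minimum possible score under the j-th negative
                     scoring function within the node.
    """
    best_positive = max(max_positive)
    # Assumption of this sketch: with no negative example sets the
    # complement term is taken to be 1.0.
    smallest_complement = min((1.0 - m for m in min_negative), default=1.0)
    return min(best_positive, smallest_complement)


def is_candidate(max_positive: Sequence[float],
                 min_negative: Sequence[float],
                 current_result_scores: Sequence[float]) -> bool:
    """A node is a candidate if its bound is not smaller than the lowest
    score in the current result set."""
    bound = node_upper_bound(max_positive, min_negative)
    return bound >= min(current_result_scores)


print(is_candidate([0.9, 0.4], [0.25], [0.5, 0.8]))
print(is_candidate([0.3], [0.0], [0.5, 0.8]))
```

A node that fails the test cannot contain an item that would displace the lowest-scoring item of the current result set, so the node and all of its descendants may be skipped.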

24. The method of claim 22, wherein the search is performed by using the scoring 
functions of the positive example sets one at a time in conjunction with the scoring 
functions of the negative example sets. 

25. Apparatus for retrieving one or more items from at least one database in 
response to a query specified by a user via at least one example set, the apparatus
comprising:

at least one processor operative to: (i) construct a scoring function from the at 
least one example set, wherein the scoring function is operable for use with a 
multidimensional indexing structure associated with the at least one database; and (ii) 
retrieve, via the multidimensional indexing structure, the one or more database items that
have the highest score as computed using the scoring function; and

memory, coupled to the at least one processor, for storing at least a portion of 
results of one or more of the constructing and retrieving operations.

26. The apparatus of claim 25, wherein the scoring function is constructed from
multiple example sets wherein the multiple example sets comprise at least one of positive 
example sets and negative example sets. 

27. The apparatus of claim 26, wherein the scoring function is constructed by 
combining respective scoring functions of the multiple example sets.

28. The apparatus of claim 27, wherein combining the scoring functions of the 
positive example sets and the negative example sets comprises: (i) modifying the scoring 
functions of the positive example sets and the negative example sets so that the scoring 
functions of the positive example sets assign low scores to representative samples of the
negative example sets, and so that the scoring functions of the negative example sets
assign low scores to representative samples of the positive example sets; and (ii) 
combining the modified scoring functions of the positive example sets and the negative 
example sets. 

29. The apparatus of claim 26, wherein the scoring function assigns a score to an 
element of a search space associated with the at least one database equal to the minimum
of: (i) the maximum of the scores assigned to the element by the scoring functions of the
positive example sets; and (ii) the minimum of one minus the scores assigned to the 
element by the scoring functions of the negative example sets. 

30. The apparatus of claim 25, wherein the scoring function gives higher scores 
to database items that are more closely related to the query than to database items that are
not as closely related to the query.

31. The apparatus of claim 25, wherein a scoring function is obtained for the at 
least one example set by: (i) computing a characteristic example from the at least one
example set; (ii) computing a dispersion characterization of the at least one example set in
association with the characteristic example; and (iii) using the characteristic example, the 
dispersion characterization, and one or more samples from the database to compute the 
scoring function. 

32. The apparatus of claim 31, wherein the characteristic example is a centroid, a
median, or a mode computed over at least a portion of the example set.

33. The apparatus of claim 31, wherein the dispersion characterization associated 
with the example set comprises a covariance matrix, a standard deviation, central 
moments, order statistics of differences, or third moments capturing asymmetry. 

34. The apparatus of claim 31, wherein the operation of using the characteristic
example, the dispersion characterization, and one or more samples from the database to
compute the scoring function further comprises computing weighted distances between 
the characteristic example and the one or more samples from the database using the 
dispersion characterization to compute weights. 

35. The apparatus of claim 34, wherein the weighted distances are Euclidean
distances or Minkowski distances.

36. The apparatus of claim 25, wherein a scoring function is computed for the at 
least one example set by converting one or more semi-metrics obtained using the at least 
one example set to one or more scores using a conversion function. 

37. The apparatus of claim 36, wherein the semi-metrics are weighted
Minkowski distances from a representative sample of examples in the at least one
example set, and further wherein weights are calculated using the examples in the at least 
one example set. 

38. The apparatus of claim 37, wherein the weights are the inverse of standard 
deviations of the examples in the at least one example set. 

39. The apparatus of claim 37, wherein the representative sample is a centroid of 
the examples in the at least one example set. 

40. The apparatus of claim 36, wherein the conversion function is a 
monotonically non-increasing continuous function having a value equal to one at the 
origin and a value of zero at infinity. 

41. The apparatus of claim 40, wherein the conversion function is a 
monotonically non-increasing continuous function having a value of one between zero 
and a first threshold value and a value of zero after a second threshold value. 

42. The apparatus of claim 25, wherein the user specifies the number of items to 
retrieve from the database. 

43. The apparatus of claim 26, wherein the retrieving step further comprises the 
step of searching the multidimensional indexing structure to retrieve from the database 
the items having the highest score. 

44. The apparatus of claim 43, wherein the multidimensional indexing structure 
is used to execute different queries.

45. The apparatus of claim 43, wherein the multidimensional indexing structure
is based on a recursive partition of a search space associated with the database using 
hyperplanes parallel to coordinate axes or surfaces other than hyperplanes parallel to 
coordinate axes. 

46. The apparatus of claim 43, wherein searching the multidimensional indexing
structure comprises: (i) using scoring functions of the multiple example sets to search a
tree to identify candidate nodes; and (ii) using the scoring functions of the multiple 
example sets to score items stored at leaves of the multidimensional indexing structure. 

47. The apparatus of claim 46, wherein identifying candidate nodes comprises: (i) 
computing for each scoring function of the positive example sets, the maximum possible
score of an item stored at the node or at one of the descendants of the node; (ii)
computing the maximum of the maximum scores; (iii) computing for each scoring 
function of the negative example sets, the minimum possible score of an item stored at 
the node or at one of the descendants of the node; (iv) computing the minimum of one
minus the minimum scores; (v) computing the minimum of: (a) the maximum of the
maximum scores; and (b) the minimum of one minus the minimum scores; (vi)
comparing the computed minimum to the minimum of the scores in a current result set; 
(vii) declaring that a node is a candidate if the minimum is not smaller than the minimum 
of the scores in the current result set; and (viii) declaring that the node is not a candidate
otherwise.

48. The apparatus of claim 46, wherein the search is performed by using the 
scoring functions of the positive example sets one at a time in conjunction with the 
scoring functions of the negative example sets.

49. An article of manufacture for retrieving one or more items from at least one
database in response to a query specified by a user via at least one example set, the article 
comprising a machine readable medium containing one or more programs which when 
executed implement the steps of:

constructing a scoring function from the at least one example set, wherein the
scoring function is operable for use with a multidimensional indexing structure associated
with the at least one database; and 

retrieving, via the multidimensional indexing structure, the one or more database 
items that have the highest score as computed using the scoring function. 
